Overview

Dataset info

Number of variables: 5
Number of observations: 1000163
Missing cells: 0 (0.0%)
Duplicate rows: 261831 (26.2%)
Total size in memory: 146.1 MiB
Average record size in memory: 153.2 B

Variable types

NUM: 3
CAT: 2

Reproduction info

Date of analysis: 2020-01-24 03:09:10.255585
Version: pandas-profiling v2.4.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 261831 (26.2%) duplicate rows
geolocation_city has a high cardinality: 8011 distinct values
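
The duplicate-row warning can be illustrated with a minimal sketch. It assumes that a duplicate is any row that repeats an earlier row, so the first occurrence is not counted; whether the report counts duplicates exactly this way is an assumption. The sample rows and the helper name `count_duplicate_rows` are hypothetical.

```python
from collections import Counter


def count_duplicate_rows(rows):
    """Count rows that repeat an earlier row (each occurrence after the first)."""
    counts = Counter(rows)
    return sum(n - 1 for n in counts.values())


# Illustrative rows only (city, lat, lng, state, zip prefix); not taken from the dataset.
rows = [
    ("sao paulo", -23.5, -46.6, "SP", 1037),
    ("sao paulo", -23.5, -46.6, "SP", 1037),
    ("sao paulo", -23.5, -46.6, "SP", 1037),
    ("curitiba", -25.4, -49.3, "PR", 80000),
]

n_duplicates = count_duplicate_rows(rows)
share = 100 * n_duplicates / len(rows)
print(n_duplicates, f"{share:.1f}%")

# The same ratio applied to the totals reported above.
reported_share = 100 * 261831 / 1000163
print(f"{reported_share:.1f}%")
```

Applied to the reported totals, the ratio reproduces the 26.2% shown in the warning.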

Variables

geolocation_city
Categorical

HIGH CARDINALITY
Distinct count: 8011
Unique (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 7.6 MiB
Value                  Count   Frequency (%)
sao paulo              135800  13.6%
rio de janeiro         62151   6.2%
belo horizonte         27805   2.8%
são paulo              24918   2.5%
curitiba               16593   1.7%
porto alegre           13521   1.4%
salvador               11865   1.2%
guarulhos              11340   1.1%
brasilia               10470   1.0%
sao bernardo do campo  8112    0.8%
Other values (8001)    677588  67.7%
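
The table lists "sao paulo" and "são paulo" as separate values, which suggests that part of the high cardinality comes from inconsistent accents. The sketch below shows one possible normalization, using the counts reported above. The helper name `strip_accents` is hypothetical, and treating the two spellings as the same city is an assumption.

```python
import unicodedata
from collections import Counter


def strip_accents(text):
    """Remove combining marks so that 'são paulo' becomes 'sao paulo'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Counts copied from the frequency table above (top 10 values only).
reported_counts = {
    "sao paulo": 135800,
    "rio de janeiro": 62151,
    "belo horizonte": 27805,
    "são paulo": 24918,
    "curitiba": 16593,
    "porto alegre": 13521,
    "salvador": 11865,
    "guarulhos": 11340,
    "brasilia": 10470,
    "sao bernardo do campo": 8112,
}

merged = Counter()
for city, count in reported_counts.items():
    merged[strip_accents(city)] += count

print(merged["sao paulo"])  # 135800 + 24918 = 160718
```

After merging, the ten listed values collapse to nine. The effect on the full 8011 distinct values cannot be derived from the report alone.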
 

Composition

Contains chars: True
Contains digits: True
Contains whitespace: True
Contains non-words: True

Length

Max length: 38
Mean length: 10.46826467
Min length: 2

geolocation_lat
Real number (ℝ)

Distinct count: 717351
Unique (%): 71.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -21.17615291
Minimum: -36.60537441
Maximum: 45.06593318
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.6 MiB

Quantile statistics

Minimum: -36.60537441
5-th percentile: -28.975746
Q1: -23.60354554
Median: -22.91937749
Q3: -19.97962034
95-th percentile: -7.673935632
Maximum: 45.06593318
Range: 81.67130759
Interquartile range (IQR): 3.623925201

Descriptive statistics

Standard deviation: 5.715866309
Coefficient of variation (CV): -0.2699199582
Kurtosis: 2.850097461
Mean: -21.17615291
Median Absolute Deviation (MAD): 3.970906813
Skewness: 1.565146669
Sum: -21179604.62
Variance: 32.67112766
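
Several of these statistics are derived from others, so they can be cross-checked by arithmetic. The sketch below recomputes range, IQR, CV, variance, and sum for geolocation_lat from the reported figures. It also computes an upper bound on a median absolute deviation about the median: at least half of the observations lie in [Q1, Q3], so that deviation cannot exceed the larger of median − Q1 and Q3 − median. The dictionary layout and the tolerance are illustrative choices.

```python
import math

# Values copied from the geolocation_lat statistics above.
lat = {
    "n": 1000163,
    "mean": -21.17615291,
    "min": -36.60537441,
    "max": 45.06593318,
    "q1": -23.60354554,
    "median": -22.91937749,
    "q3": -19.97962034,
    "std": 5.715866309,
    "range": 81.67130759,
    "iqr": 3.623925201,
    "cv": -0.2699199582,
    "variance": 32.67112766,
    "sum": -21179604.62,
    "mad": 3.970906813,
}


def close(a, b):
    """Compare two reported figures, allowing for rounding in the report."""
    return math.isclose(a, b, rel_tol=1e-6)


checks = {
    "range": close(lat["max"] - lat["min"], lat["range"]),
    "iqr": close(lat["q3"] - lat["q1"], lat["iqr"]),
    "cv": close(lat["std"] / lat["mean"], lat["cv"]),
    "variance": close(lat["std"] ** 2, lat["variance"]),
    "sum": close(lat["mean"] * lat["n"], lat["sum"]),
}
print(checks)

# Upper bound on a median absolute deviation about the median.
mad_upper_bound = max(lat["median"] - lat["q1"], lat["q3"] - lat["median"])
print(mad_upper_bound, lat["mad"] > mad_upper_bound)
```

The five derived values agree with the report. The reported MAD (3.97) exceeds the bound (about 2.94), so the figure may be a mean absolute deviation rather than a median one. That reading is an inference from the numbers, not something the report states.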
[Figure: histogram with fixed size bins (bins=10)]
[Figure: histogram with variable size bins (bins=[-36.60537441 -33.69256 -33.69143423 -33.6914104 -33.68381703 ... 3.84898001 4.47635114 4.48182209 38.29607193 45.06593318], "bayesian blocks" binning strategy used)]
Common values

Value                  Count   Frequency (%)
-27.102099             314     < 0.1%
-23.49590147           190     < 0.1%
-23.50604921           141     < 0.1%
-23.49061751           127     < 0.1%
-23.00551426           102     < 0.1%
-22.96590556           89      < 0.1%
-23.00458225           89      < 0.1%
-15.84145095           85      < 0.1%
-23.53718574           83      < 0.1%
-23.49189491           82      < 0.1%
Other values (717341)  998861  99.9%

Minimum 5 values

Value         Count  Frequency (%)
-36.60537441  1      < 0.1%
-36.60383679  2      < 0.1%
-34.62239972  1      < 0.1%
-34.58642211  1      < 0.1%
-33.69261619  1      < 0.1%

Maximum 5 values

Value        Count  Frequency (%)
45.06593318  1      < 0.1%
43.68496097  1      < 0.1%
42.43928592  1      < 0.1%
42.42888406  1      < 0.1%
42.18400274  2      < 0.1%

geolocation_lng
Real number (ℝ)

Distinct count: 717612
Unique (%): 71.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -46.39054132
Minimum: -101.4667664
Maximum: 121.1053938
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.6 MiB

Quantile statistics

Minimum: -101.4667664
5-th percentile: -53.22031459
Q1: -48.57317218
Median: -46.63787867
Q3: -43.7677088
95-th percentile: -38.50438615
Maximum: 121.1053938
Range: 222.5721603
Interquartile range (IQR): 4.805463378

Descriptive statistics

Standard deviation: 4.269748307
Coefficient of variation (CV): -0.09203919991
Kurtosis: 4.727050625
Mean: -46.39054132
Median Absolute Deviation (MAD): 3.013916346
Skewness: -0.1024168496
Sum: -46398102.98
Variance: 18.2307506
[Figure: histogram with fixed size bins (bins=10)]
[Figure: histogram with variable size bins (bins=[-101.46676645 -72.92902103 -72.68578078 -72.64942799 -71.69276386 ... -47.88993189 -47.88993106 -47.888856 -47.888856 121.10539381], "bayesian blocks" binning strategy used)]
Common values

Value                  Count   Frequency (%)
-48.6296135            314     < 0.1%
-46.8746867            190     < 0.1%
-46.7173774            141     < 0.1%
-46.86900367           127     < 0.1%
-43.37596441           102     < 0.1%
-46.54629394           91      < 0.1%
-43.3899987            89      < 0.1%
-43.31989932           89      < 0.1%
-48.02402569           85      < 0.1%
-46.59403568           83      < 0.1%
Other values (717602)  998852  99.9%

Minimum 5 values

Value         Count  Frequency (%)
-101.4667664  1      < 0.1%
-98.48412075  1      < 0.1%
-98.07854439  1      < 0.1%
-98.0785331   1      < 0.1%
-72.93074575  1      < 0.1%

Maximum 5 values

Value         Count  Frequency (%)
121.1053938   1      < 0.1%
13.8202141    1      < 0.1%
9.34152763    1      < 0.1%
-4.947823289  1      < 0.1%
-6.328200272  1      < 0.1%
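
The extreme latitudes and longitudes sit far from the bulk of the data (the 5th to 95th percentiles span roughly -29 to -8 for latitude and -53 to -39 for longitude), which hints at a few mislocated points. The sketch below flags coordinates outside a rough bounding box for mainland Brazil. The box limits are approximate values assumed for illustration, and the helper name `in_bounding_box` is hypothetical.

```python
# Approximate bounding box for mainland Brazil (assumed values, for illustration only).
LAT_MIN, LAT_MAX = -33.75, 5.27
LNG_MIN, LNG_MAX = -73.99, -34.79


def in_bounding_box(lat, lng):
    """Return True when the point lies inside the assumed bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LNG_MIN <= lng <= LNG_MAX


# Reported minimum and maximum of each coordinate, checked one axis at a time
# because the report does not say which latitude belongs to which longitude.
lat_extremes = [-36.60537441, 45.06593318]
lng_extremes = [-101.4667664, 121.1053938]

out_of_range_lat = [v for v in lat_extremes if not LAT_MIN <= v <= LAT_MAX]
out_of_range_lng = [v for v in lng_extremes if not LNG_MIN <= v <= LNG_MAX]
print(out_of_range_lat)
print(out_of_range_lng)

# First sample row of the report (sao paulo).
print(in_bounding_box(-23.545621, -46.639292))
```

Under these assumed limits, all four reported extremes fall outside the box, while the first sample row falls inside it.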
 
geolocation_state
Categorical

Distinct count: 27
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.6 MiB
Value              Count   Frequency (%)
SP                 404268  40.4%
MG                 126336  12.6%
RJ                 121169  12.1%
RS                 61851   6.2%
PR                 57859   5.8%
SC                 38328   3.8%
BA                 36045   3.6%
GO                 20139   2.0%
ES                 16748   1.7%
PE                 16432   1.6%
Other values (17)  100988  10.1%
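
The percentages and the "Other values" row follow directly from the counts and the number of observations. The sketch below rebuilds them from the figures above. The function name `frequency_table` is hypothetical, and the number of remaining states (17) is taken from the report rather than computed.

```python
def frequency_table(counts, total, top=10):
    """Return (value, count, percent) rows for the top values plus a remainder row."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:top]
    rows = [(value, count, 100 * count / total) for value, count in ranked]
    other = total - sum(count for _, count in ranked)
    rows.append(("Other values", other, 100 * other / total))
    return rows


# Counts copied from the table above; 1000163 is the reported number of observations.
state_counts = {
    "SP": 404268, "MG": 126336, "RJ": 121169, "RS": 61851, "PR": 57859,
    "SC": 38328, "BA": 36045, "GO": 20139, "ES": 16748, "PE": 16432,
}

table = frequency_table(state_counts, total=1000163)
for value, count, percent in table:
    print(f"{value:<14}{count:>8}{percent:>7.1f}%")
```

The remainder row comes out at 100988 rows (10.1%), matching the report.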
 

Composition

Contains chars: True
Contains digits: False
Contains whitespace: False
Contains non-words: False

Length

Max length: 2
Mean length: 2
Min length: 2

geolocation_zip_code_prefix
Real number (ℝ≥0)

Distinct count: 19015
Unique (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36574.16647
Minimum: 1001
Maximum: 99990
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.6 MiB

Quantile statistics

Minimum: 1001
5-th percentile: 3220.1
Q1: 11075
Median: 26530
Q3: 63504
95-th percentile: 91750
Maximum: 99990
Range: 98989
Interquartile range (IQR): 52429

Descriptive statistics

Standard deviation: 30549.33571
Coefficient of variation (CV): 0.8352708664
Kurtosis: -0.9412260072
Mean: 36574.16647
Median Absolute Deviation (MAD): 26010.85282
Skewness: 0.6944941944
Sum: 3.658012806e+10
Variance: 933261912.3
[Figure: histogram with fixed size bins (bins=10)]
[Figure: histogram with variable size bins (bins=[ 1001. 1001.5 1013.5 1014.5 1019.5 ... 99905. 99927.5 99945. 99951. 99990. ], "bayesian blocks" binning strategy used)]
Common values

Value                 Count   Frequency (%)
24220                 1146    0.1%
24230                 1102    0.1%
38400                 965     0.1%
35500                 907     0.1%
11680                 879     0.1%
22631                 832     0.1%
30140                 810     0.1%
11740                 788     0.1%
38408                 773     0.1%
28970                 743     0.1%
Other values (19005)  991218  99.1%

Minimum 5 values

Value  Count  Frequency (%)
1001   26     < 0.1%
1002   13     < 0.1%
1003   17     < 0.1%
1004   22     < 0.1%
1005   25     < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
99990  2      < 0.1%
99980  26     < 0.1%
99970  21     < 0.1%
99965  6      < 0.1%
99960  5      < 0.1%
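
Because geolocation_zip_code_prefix is profiled as a number, any leading zeros the prefixes originally carried are not visible: the smallest values appear as four digits (1001 to 1005) while the largest have five. The sketch below restores a fixed width by zero-padding. The five-digit width is an assumption based on the reported maximum, and the function name `format_zip_prefix` is hypothetical.

```python
def format_zip_prefix(prefix, width=5):
    """Render a numeric zip code prefix as a zero-padded string (width is assumed)."""
    return str(prefix).zfill(width)


# Values taken from the minimum and maximum tables above.
for value in (1001, 1005, 99960, 99990):
    print(value, "->", format_zip_prefix(value))
```

Treating the column as a string would also stop the report from computing statistics such as mean and skewness, which carry little meaning for a postal code.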
 

Correlations

Missing values

Sample

First rows

index  geolocation_city  geolocation_lat  geolocation_lng  geolocation_state  geolocation_zip_code_prefix
0      sao paulo         -23.545621       -46.639292       SP                 1037
1      sao paulo         -23.546081       -46.644820       SP                 1046
2      sao paulo         -23.546129       -46.642951       SP                 1046
3      sao paulo         -23.544392       -46.639499       SP                 1041
4      sao paulo         -23.541578       -46.641607       SP                 1035
5      são paulo         -23.547762       -46.635361       SP                 1012
6      sao paulo         -23.546273       -46.641225       SP                 1047
7      sao paulo         -23.546923       -46.634264       SP                 1013
8      sao paulo         -23.543769       -46.634278       SP                 1029
9      sao paulo         -23.547640       -46.636032       SP                 1011

Last rows

index    geolocation_city  geolocation_lat  geolocation_lng  geolocation_state  geolocation_zip_code_prefix
1000153  ciriaco           -28.343273       -51.873734       RS                 99970
1000154  tapejara          -28.070493       -52.011342       RS                 99950
1000155  agua santa        -28.180655       -52.034367       RS                 99965
1000156  tapejara          -28.072188       -52.011272       RS                 99950
1000157  tapejara          -28.068864       -52.012964       RS                 99950
1000158  tapejara          -28.068639       -52.010705       RS                 99950
1000159  getulio vargas    -27.877125       -52.224882       RS                 99900
1000160  tapejara          -28.071855       -52.014716       RS                 99950
1000161  david canabarro   -28.388932       -51.846871       RS                 99980
1000162  tapejara          -28.070104       -52.018658       RS                 99950